Under new regulatory guidelines in Amsterdam, the Airbnb market has become increasingly challenging to navigate. As of July 2020, the City of Amsterdam banned Airbnb rentals outright in three of Amsterdam’s city center neighborhoods, limited the number of nights a host can rent out their property and the number of guests a property can host, and introduced a permit fee. Checkbnb, an affiliate of Airbnb, allows prospective hosts to estimate the potential revenue from renting their property based on the home’s specifications, listing details, and access to amenities and disamenities in the surrounding area. Our model uses Amsterdam Airbnb data from 2018 to inform a model intended to generalize to today’s rental market. Its output, potential revenue, gives users an estimate of the financial gains they could expect from renting out their home on the Airbnb platform. Given the challenging regulatory landscape, Checkbnb seeks to simplify the decision-making process for prospective hosts through an easy-to-use platform and an accurate model.
The data used in our final Checkbnb model came from three sources: Airbnb-provided data on 2018 listings in Amsterdam, open data provided by the City of Amsterdam, and OpenStreetMap (OSM) data, which crowd-sources the locations of landmarks and amenities across the city. The Airbnb data includes each listing’s price, number of bedrooms and bathrooms, reviews, and the amenities available to guests. The city’s data includes the locations of city-provided services and neighborhood features, as well as land use types. Lastly, the OSM data provides additional location data on city amenities and features.
#Setting the bounding box to pull in OSM data
xmin = st_bbox(districts)[[1]]
ymin = st_bbox(districts)[[2]]
xmax = st_bbox(districts)[[3]]
ymax = st_bbox(districts)[[4]]
# Bars
bar <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
add_osm_feature(key = 'amenity', value = c("bar", "biergarten", "pub")) %>%
osmdata_sf()
bar <-
bar$osm_points %>%
.[districts,]
# Restaurants
restaurant <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
add_osm_feature(key = 'amenity', value = c("restaurant", "cafe")) %>%
osmdata_sf()
restaurant <-
restaurant$osm_points %>%
.[districts,]
## Universities
university <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
add_osm_feature(key = 'amenity', value = c("university", "college")) %>%
osmdata_sf()
university <-
university$osm_points %>%
.[districts,]
## Schools
schools <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
add_osm_feature(key = 'amenity', value = c("school")) %>%
osmdata_sf()
schools <-
schools$osm_points %>%
.[districts,]
## land use - retail
retail <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
add_osm_feature(key = 'landuse', value = c("retail")) %>%
osmdata_sf()
retail <-
retail$osm_points %>%
.[districts,]
## stadiums/sports centers
stadiums <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
add_osm_feature(key = 'leisure', value = c("stadium")) %>%
osmdata_sf()
stadiums <-
stadiums$osm_points %>%
.[districts,]
## Parks
parks_osm <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
add_osm_feature(key = 'leisure', value = c("park")) %>%
osmdata_sf()
parks_osm <-
parks_osm$osm_points %>%
.[districts,]
## industrial buildings
industrial <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
add_osm_feature(key = 'building', value = c("industrial")) %>%
osmdata_sf()
industrial <-
industrial$osm_points %>%
.[districts,]
Before engineering any additional features for our model, we reviewed the data available in the Airbnb dataset. Figure 1 below maps the locations and nightly prices of Amsterdam Airbnbs in 2018. Both the number of listings and their prices show a clear spatial pattern: Airbnb rentals are concentrated in the central part of the city and thin out toward the periphery, and higher prices are likewise concentrated in the city center.
Figures 2 and 3 show some of the categorical and numerical features in the Airbnb dataset and their relationship to price. On average, price is higher when units have real beds and when they offer the entire home to the renter rather than a single room. Some neighborhoods, including Buiksloterham on the north side of the IJ, have higher average prices, and property types such as serviced apartments and villas also command higher prices. The numerical features show a positive relationship between price and the number of bedrooms and bathrooms, the number of people the home can accommodate, and the number of days per year the home is available.
#######################
# Neighborhood Plot
######################
listings_details.sf <- listings_details %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326, agr = "constant") %>%
st_transform(st_crs(neighborhoods_new.sf))
# strip "$" and thousands separators (e.g. "$1,200.00") before converting to numeric
listings_details.sf$price2 = as.numeric(gsub("[$,]", "", listings_details.sf$price))
ggplot() +
geom_sf(data = neighborhoods_new.sf, fill = "#2f4550") +
geom_sf(data = listings_details.sf, aes(colour = q5(price2)),
show.legend = "point", size = .1) +
scale_colour_manual(values = paletteorngs,
labels = qBr(listings_details.sf, "price2"),
name = "Nightly Airbnb Price\n(Quintile Breaks)") +
labs(title="Nightly Airbnb Price, Amsterdam",
subtitle = "Figure 1") +
mapTheme()
#taking out outlier property types
outlier_types <- c("Lighthouse", "Earth house", "Nature lodge", "Castle", "Tent", "Campsite")
listings_details <- filter(listings_details, !property_type %in% outlier_types)
# numeric nightly price for the plots below (strip "$" and thousands separators)
listings_details$price2 <- as.numeric(gsub("[$,]", "", listings_details$price))
listings_details %>%
dplyr::select(price2, property_type, room_type, neighbourhood, bed_type) %>%
gather(Variable, Value, -price2) %>%
ggplot(aes(Value, price2)) +
geom_bar(position = "dodge", stat = "summary", fun = "mean", fill = "#7fc3dc", col = "#1a81a2", alpha = 0.9) +
facet_wrap(~Variable, ncol = 1, scales = "free") +
labs(title = "Price as a function of categorical variables", y = "Mean Price", subtitle = "Figure 2") +
plotTheme() + theme(axis.text.x = element_text(angle = 45, size=20, hjust = 1),
axis.text.y = element_text(size = 20),
plot.title = element_text(size = 30),
plot.subtitle = element_text(size = 20),
axis.title.x = element_text(size = 20),
axis.title.y = element_text(size = 20))
st_drop_geometry(listings_details.sf) %>%
dplyr::select(price2, accommodates, bedrooms, bathrooms, availability_365) %>%
filter(price2 <= 1000000) %>%
gather(Variable, Value, -price2) %>%
ggplot(aes(Value, price2)) +
geom_point(shape = 16, size = 3,color= "#cde1b1", alpha = 0.5) + geom_smooth(method = "lm", se=F, colour = "#E86E23") +
facet_wrap(~Variable, ncol = 4, scales = "free") +
labs(title = "Price as a function of numerical variables", subtitle = "Figure 3") +
plotTheme() + theme(plot.title = element_text(size = 30),
plot.subtitle = element_text(size = 20),
axis.title.x = element_text(size = 20),
axis.title.y = element_text(size = 20))
In addition to features from the original Airbnb dataset, we engineered a series of variables for our final model with the goal of minimizing prediction error for nightly rental price. We wanted the model to be both accurate and generalizable across a range of property types and locations.
We engineered variables from the descriptions hosts posted about their homes, searching them for keywords associated with higher or lower prices. We used nearest-neighbor distances to measure access to services and amenities in the city, such as swimming areas and markets. We also calculated lagPrice for each listing, the average price of its three nearest neighboring listings, and used local Moran’s I to identify statistically significant clusters of high-priced listings, from which we measured each listing’s distance to the closest such cluster.
We first analyzed the text of each listing (its name, summary, description, and amenities list) to identify keywords associated with higher or lower prices, and created a yes/no dummy variable for each group of keywords.
listings_details.sf$luxury <- ifelse(grepl("luxur", ignore.case=TRUE, listings_details.sf$name), "yes", "no")
listings_details.sf$canal <- ifelse(grepl("canal", ignore.case=TRUE, listings_details.sf$name), "yes", "no")
listings_details.sf$expamen <- ifelse(grepl("view|terrace|spac|rooftop|loft|roof", ignore.case=TRUE, listings_details.sf$name), "yes", "no")
listings_details.sf$expcodes <- ifelse(grepl("family|big|light|heart|large|design|pijp|jordaan", ignore.case=TRUE, listings_details.sf$name), "yes", "no")
listings_details.sf$citycenterdesc <- ifelse(grepl("center|centre",ignore.case=TRUE, listings_details.sf$name), "yes", "no")
listings_details.sf$cheapcodes <- ifelse(grepl("cozy|cosy|free|room|little|bed|garden|vondelpark", ignore.case=TRUE, listings_details.sf$name), "yes", "no")
listings_details.sf$pool <- ifelse(grepl("Pool", ignore.case=TRUE, listings_details.sf$amenities), "yes", "no")
listings_details.sf$expamencat <- ifelse(grepl("friendly|detector|workspace|hot|water|parking|private|first|aid|greets|luggage|wide|kit|linens|dropoff|balcony|step-free|books|dryer|laptop|access|premises", ignore.case=TRUE, listings_details.sf$amenities), "yes", "no")
listings_details.sf$expsummary <- ifelse(grepl("located|minutes|walk|bars|view|center|away|museum|tram", ignore.case=TRUE, listings_details.sf$summary), "yes", "no")
# no trailing "|" in these patterns: an empty alternative matches every string
listings_details.sf$expdescrip <- ifelse(grepl("kitchen|located|floor|garden|walk|large|double|open|centre|beautiful|center|terrace|fully|canal|big|shops|two|close|view|also|famous|bright", ignore.case=TRUE, listings_details.sf$description), "yes", "no")
listings_details.sf$expneighbdesc <- ifelse(grepl("walk|city| shops|bars|nice|just|museum|distance|close|min|Jordaan|best|local|Anne|popular|trendy|Gogh|Pijp|quiet|meters|right|one|park|Amstel|Cuyp", ignore.case=TRUE, listings_details.sf$summary), "yes", "no")
listings_details.sf$expneighwoclou <- ifelse(grepl("Pijp|Plantage|Westelijke|Zeeburg|Zeeheldenbuurt|Lastage|Weesperbuurt|Oud-Zuid", ignore.case=TRUE, listings_details.sf$summary), "yes", "no")
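Each flag above follows the same pattern: join a set of keywords into one regular expression and test each listing’s text for a case-insensitive match. A minimal base-R sketch of that pattern is below; `flag_keywords()` and the example strings are hypothetical, for illustration only. Building the pattern with `paste(..., collapse = "|")` keeps an empty alternative from slipping onto the end of the regex, where it would match every string.

```r
# Hypothetical helper: returns "yes" where `text` contains any of `keywords`
# (case-insensitive), "no" otherwise. Missing text (NA) becomes "no",
# because grepl() returns FALSE for NA inputs.
flag_keywords <- function(text, keywords) {
  pattern <- paste(keywords, collapse = "|")  # e.g. "luxur|canal"
  hit <- grepl(pattern, text, ignore.case = TRUE)
  ifelse(hit, "yes", "no")
}

# Illustrative listing names (not taken from the dataset)
example_names <- c("Luxury canal loft", "Cozy room near Vondelpark", NA)
flag_keywords(example_names, c("luxur", "canal"))  # "yes" "no" "no"
```

Keeping the keyword lists as character vectors also makes it easy to review or extend each group without editing a long regex by hand.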
For a series of features from OSM and Amsterdam’s open data portal, we calculated the average nearest-neighbor distance from each listing to the k closest instances (k = 1, 2, or 3, depending on the feature) of each of the following (dis)amenities and services around the city.
st_c <- st_coordinates
## Metro Stops
metro_stops.sf <- metro_stops%>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
metro_stops.sf <- st_join(metro_stops.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
metrostops_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(metro_stops.sf)), 2))
## Swimming Areas
swim.sf <- swim%>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
swim.sf <- st_join(swim.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
swim_nn1 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(swim.sf)), 1))
## Wall Art
wall_art.sf <- wall_art%>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
wall_art.sf <- st_join(wall_art.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
wallart_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(wall_art.sf)), 3))
## Markets
markets.sf <- markets%>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
markets.sf <- st_join(markets.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
markets_nn1 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(markets.sf)), 1))
## Playgrounds
playgrounds.sf <- playgrounds %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
playgrounds.sf <- st_join(playgrounds.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
playgrounds_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(playgrounds.sf)), 2))
## Student Housing
students.sf <- student_housing %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
students.sf <- st_join(students.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
students_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(students.sf)), 2))
## Historic buildings
hist_build.sf <- hist_build %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
hist_build.sf <- st_join(hist_build.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
histbuild_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(hist_build.sf)), 3))
## Monuments
monuments.sf <- monuments %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
monuments.sf <- st_join(monuments.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
monuments_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(monuments.sf)), 3))
## Bars
bar.sf <- bar %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
bar.sf <- st_join(bar.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
bar_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(bar.sf)), 2))
## Restaurants
restaurant.sf <- restaurant %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
restaurant.sf <- st_join(restaurant.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
restaurant_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(restaurant.sf)), 3))
## University
university.sf <- university %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
university.sf <- st_join(university.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
university_nn1 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(university.sf)), 1))
## Schools
schools.sf <- schools %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
schools.sf <- st_join(schools.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
schools_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(schools.sf)), 2))
## Retail
retail.sf <- retail %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
retail.sf <- st_join(retail.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
retail_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(retail.sf)), 3))
## Industrial
industrial.sf <- industrial %>%
st_transform(st_crs(districts.sf)) %>%
st_as_sf()
industrial.sf <- st_join(industrial.sf, districts.sf, join = st_intersects, left = FALSE)
listings_details.sf <-
listings_details.sf %>%
mutate(
industrial_nn1 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(industrial.sf)), 1))
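The `nn_function()` helper used above is not defined in this section. As a rough guide to what it computes, here is a base-R sketch of the same idea: for each listing, the mean distance to its k nearest features. The name `nn_mean_dist()` is hypothetical, and the sketch assumes both inputs are coordinate matrices in a projected CRS measured in metres (such as EPSG:28992), so Euclidean distance is meaningful. It computes every pairwise distance, which is fine for illustration but slower than a k-d tree search on the full listings dataset.

```r
# Hypothetical base-R stand-in for nn_function(): for each row of `from`,
# the mean Euclidean distance to its k nearest rows of `to`.
nn_mean_dist <- function(from, to, k) {
  from <- as.matrix(from)
  to <- as.matrix(to)
  k <- min(k, nrow(to))                  # cannot use more neighbours than exist
  apply(from, 1, function(p) {
    d <- sqrt(colSums((t(to) - p)^2))    # distances from point p to every row of `to`
    mean(sort(d)[seq_len(k)])            # average of the k smallest distances
  })
}

# Toy coordinates in metres (illustrative only)
toy_listings  <- rbind(c(0, 0), c(10, 0))
toy_amenities <- rbind(c(3, 4), c(0, 1), c(10, 2))
nn_mean_dist(toy_listings, toy_amenities, k = 1)  # c(1, 2)
nn_mean_dist(toy_listings, toy_amenities, k = 2)
```

With k = 1 the result is simply the distance to the single closest feature, which is also how the distance to the nearest significant hotspot cell (sig_cell) is measured later.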
We removed listings with a price of zero along with outlier property types, then calculated the average price of the three closest rental properties to each listing (lagPrice).
listings_details.sf <- subset(listings_details.sf, price2 > 0)
listings_details.sf = filter(listings_details.sf,
property_type != "Lighthouse" &
property_type != "Earth house" &
property_type != "Nature lodge" &
property_type != "Castle" &
property_type != "Tent" &
property_type != "Campsite" &
property_type != "Barn")
k_nearest_neighbors = 3
#prices
coords <- st_coordinates(listings_details.sf)
# k nearest neighbors
neighborList <- knn2nb(knearneigh(coords, k_nearest_neighbors))
spatialWeights <- nb2listw(neighborList, style="W")
listings_details.sf$lagPrice <- lag.listw(spatialWeights, listings_details.sf$price2)
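With row-standardized weights (`style = "W"`), `lag.listw()` on a k-nearest-neighbor list gives each listing the unweighted mean price of its k neighbors, excluding the listing itself. A minimal base-R sketch of that calculation, useful for sanity-checking lagPrice on a small sample, follows; `knn_lag()` is a hypothetical name, and ties in distance are broken by row order here, which may differ from how spdep’s `knearneigh()` breaks them.

```r
# Hypothetical helper: mean of `values` over each point's k nearest neighbours,
# excluding the point itself (mirrors a row-standardised kNN spatial lag).
knn_lag <- function(coords, values, k = 3) {
  coords <- as.matrix(coords)
  d <- as.matrix(dist(coords))   # full pairwise Euclidean distance matrix
  diag(d) <- Inf                 # a listing is never its own neighbour
  sapply(seq_len(nrow(coords)), function(i) {
    nbrs <- order(d[i, ])[seq_len(k)]  # indices of the k closest listings
    mean(values[nbrs])
  })
}

# Five toy listings on a line (x in metres) with illustrative prices
toy_coords <- cbind(x = c(0, 1, 3, 7, 12), y = 0)
toy_prices <- c(100, 200, 300, 400, 1000)
knn_lag(toy_coords, toy_prices, k = 2)  # c(250, 200, 150, 650, 350)
```

The fourth listing shows why the lag is informative: its own price is 400, but one of its two nearest neighbors costs 1000, so its neighborhood average is 650.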
Figure 4 shows localized clustering of higher-priced homes along the canal as well as just north of it. Figure 5 identifies the statistically significant hotspots of high-priced rentals. We kept the significant fishnet cells as polygons and calculated the distance from each rental property to its closest significant cell.
# Create fishnet
fishnet <-
st_make_grid(neighborhoods_new.sf, cellsize = 200) %>%
st_sf() %>%
mutate(uniqueID = rownames(.))
# Create fishnet with the average price of the rental properties located in each grid cell
price_net <-
dplyr::select(listings_details.sf) %>%
mutate(price = listings_details.sf$price2) %>%
aggregate(., fishnet, mean) %>%
mutate(price = replace_na(price, 0),
uniqueID = rownames(.),
cvID = sample(round(nrow(fishnet) / 24), size=nrow(fishnet), replace = TRUE))
price_net <- subset(price_net, price > 0)
ggplot() +
geom_sf(data=fishnet, fill = "grey40") +
geom_sf(data = price_net, aes(fill = price)) +
scale_fill_viridis() +
labs(title = "Average Rental Price", subtitle = "Figure 4") +
mapTheme()
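`aggregate(., fishnet, mean)` assigns each listing to the 200 m grid cell it falls in and averages the prices within each cell. The same idea in plain base R, for points with projected x/y coordinates, is sketched below. `cell_mean_price()` is hypothetical, and it anchors the grid at the minimum x and y of the points, whereas `st_make_grid()` anchors it at the bounding box of the neighborhood layer, so cell boundaries will not line up exactly. Empty cells simply do not appear, which matches the step above that keeps only cells with a positive average price.

```r
# Hypothetical helper: mean price per square grid cell of side `cellsize`
# (coordinates assumed to be projected, in metres).
cell_mean_price <- function(x, y, price, cellsize = 200) {
  col_idx <- floor((x - min(x)) / cellsize)  # grid column of each point
  row_idx <- floor((y - min(y)) / cellsize)  # grid row of each point
  cell_id <- paste(col_idx, row_idx, sep = "_")
  tapply(price, cell_id, mean)               # one mean per occupied cell
}

toy_x     <- c(10, 150, 250, 390)
toy_y     <- c(10, 10, 10, 10)
toy_price <- c(100, 200, 300, 500)
cell_mean_price(toy_x, toy_y, toy_price)     # cells "0_0" = 150, "1_0" = 400
```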
# Local Moran's I
## Create neighbor list and spatial weights matrix
final_net.nb <- poly2nb(as_Spatial(price_net), queen=TRUE)
final_net.weights <- nb2listw(final_net.nb, style="W", zero.policy=TRUE)
# Combining the price_net with localmoran test
final_net.localMorans <-
cbind(
as.data.frame(localmoran(price_net$price, final_net.weights)),
as.data.frame(price_net)) %>%
st_sf() %>%
dplyr::select(Avg_Price = price,
Local_Morans_I = Ii,
P_Value = `Pr(z > 0)`) %>%
mutate(Significant_Hotspots = ifelse(P_Value <= 0.001, 1, 0)) %>%
gather(Variable, Value, -geometry)
vars <- unique(final_net.localMorans$Variable)
varList <- list()
for(i in vars){
varList[[i]] <-
ggplot() +
geom_sf(data = fishnet, fill = "grey40") +
geom_sf(data = filter(final_net.localMorans, Variable == i),
aes(fill = Value), colour=NA) +
scale_fill_viridis(name="") +
labs(title=i, subtitle = "Figure 5") +
mapTheme() + theme(legend.position="bottom")}
do.call(grid.arrange,c(varList, ncol = 4, top = "Local Morans I statistics, Price"))
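For reference, the statistic that `localmoran()` reports as `Ii` can be computed directly. With centered values z_i = x_i - mean(x), m2 = sum(z_i^2) / n, and row-standardized weights w_ij, local Moran’s I for cell i is I_i = (z_i / m2) * sum_j(w_ij * z_j): it is positive when a cell and its neighbors are both above (or both below) the mean. The base-R sketch below applies a hypothetical `local_morans_i()` to a toy four-cell strip; it reproduces only the statistic, not the p-values that `localmoran()` uses to flag significant hotspots.

```r
# Hypothetical helper: local Moran's I for values `x` given a binary
# contiguity matrix `W` (W[i, j] = 1 if cells i and j are neighbours).
local_morans_i <- function(x, W) {
  W <- W / rowSums(W)          # row-standardise; assumes every cell has >= 1 neighbour
  z <- x - mean(x)             # deviations from the mean
  m2 <- sum(z^2) / length(x)   # second moment of the deviations
  as.vector((z / m2) * (W %*% z))
}

# Four cells in a strip (1-2-3-4): two high-price cells next to two low-price cells
toy_W <- matrix(c(0, 1, 0, 0,
                  1, 0, 1, 0,
                  0, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)
local_morans_i(c(10, 10, 0, 0), toy_W)  # c(1, 0, 0, 1)
```

The end cells sit next to a similar neighbor, so their I is positive; each middle cell has one high and one low neighbor, which cancel out to zero.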
# keep only the fishnet cells flagged as significant hotspots
sig_net <-
dplyr::filter(final_net.localMorans, Variable == "Significant_Hotspots", Value == 1)
# final_net.localMorans is already an sf object, so it only needs to be
# projected to the listings' CRS before measuring distances
sig_net.sf <- st_transform(sig_net, st_crs(listings_details.sf))
listings_details.sf <-
listings_details.sf %>%
mutate(
sig_cell = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(sig_net.sf)), 1))
We binned three of our numerical variables into categories and recoded one categorical variable as a dummy to improve their predictive power in the model.
listings_details.sf <-
listings_details.sf %>%
mutate(size_of_group =
case_when(guests_included > 4 ~ "Large Group",
guests_included > 2 ~ "Small Group",
guests_included >= 1 ~ "Solo"))
listings_details.sf <-
listings_details.sf %>%
mutate(expneighbors =
case_when(lagPrice <= 90 ~ "Cheap Neighbors",
lagPrice <= 250 ~ "Average Neighbors",
lagPrice > 250 ~ "Expensive Neighbors"))
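# sig_cell is the distance to the nearest significant hotspot cell,
# so smaller values mean a listing is closer to (or inside) a hotspot
listings_details.sf <-
listings_details.sf %>%
mutate(hotspotlevel =
case_when(sig_cell <= 368 ~ "Hot Spot",
sig_cell <= 1500 ~ "Semi-Hot",
sig_cell > 1500 ~ "Not Hot Spot"))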
# 1 if the whole home/apartment is rented out, 0 otherwise
listings_details.sf <- listings_details.sf %>%
mutate(Entire_Home = as.integer(room_type %in% "Entire home/apt"))
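Because `case_when()` stops at the first true condition, contiguous bins can be written with a single upper bound per category, which leaves no gaps for continuous values such as lagPrice (for example, a listing whose neighbors average 250.5). Base R’s `cut()` expresses the same idea with explicit breaks. The helper name `bin_neighbors()` is hypothetical; the thresholds are the ones used above for expneighbors.

```r
# Hypothetical helper: bin the average neighbour price into three labelled,
# gap-free categories. Intervals are right-closed: (-Inf, 90], (90, 250], (250, Inf).
bin_neighbors <- function(lag_price) {
  cut(lag_price,
      breaks = c(-Inf, 90, 250, Inf),
      labels = c("Cheap Neighbors", "Average Neighbors", "Expensive Neighbors"))
}

bin_neighbors(c(50, 90, 90.5, 250, 250.5, 400))
```

One design difference to keep in mind: `cut()` returns a factor whose first level ("Cheap Neighbors") becomes the reference category in `lm()`, whereas a character column is converted with alphabetically sorted levels, so the reference category (and the coefficient rows reported) would change.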
The correlation matrix in Figure 6 below shows the numerical variables included in our final model, a combination of variables from the Airbnb dataset and engineered features. Figure 7 plots price against three of the engineered features: average distance to the three nearest historical buildings, average price of the three nearest rentals, and the presence of coded language associated with cheaper properties.
Vars <- listings_details.sf %>%
dplyr::select(price2,
accommodates, bathrooms, bedrooms, availability_365, number_of_reviews, review_scores_rating,
markets_nn1, metrostops_nn2, playgrounds_nn2, histbuild_nn3, monuments_nn3, restaurant_nn3,
university_nn1, retail_nn3, industrial_nn1, lagPrice, sig_cell, minimum_nights, students_nn2,
wallart_nn3, schools_nn2, beds, review_scores_accuracy, review_scores_cleanliness,
review_scores_communication, review_scores_location, review_scores_value, reviews_per_month,
guests_included) %>%
na.omit()
Vars <- st_drop_geometry(Vars)
ggcorrplot(
round(cor(Vars), 1),
p.mat = cor_pmat(Vars),
colors = c("#037499","light grey","#f6b492"),
type="lower",
insig = "blank") +
labs(title = "Correlation across numeric variables", subtitle = "Figure 6") +
plotTheme() + theme(axis.text.x = element_text(angle = 45, size=25, hjust = 1),
axis.text.y = element_text(size = 25),
plot.title = element_text(size = 30),
plot.subtitle = element_text(size = 20))
Hist_build_plot <- st_drop_geometry(listings_details.sf) %>%
dplyr::select(price2, histbuild_nn3) %>%
filter(price2 <= 1000000) %>%
gather(Variable, Value, -price2) %>%
ggplot(aes(Value, price2)) +
geom_point(shape = 16, size = 1, color= "#72b7cd", alpha = 0.7) + geom_smooth(method = "lm", se=F, colour = "#E86E23") +
facet_wrap(~Variable, ncol = 3, scales = "free") +
plotTheme()
lagPrice_plot <- st_drop_geometry(listings_details.sf) %>%
dplyr::select(price2, lagPrice) %>%
filter(price2 <= 1000000) %>%
gather(Variable, Value, -price2) %>%
ggplot(aes(Value, price2)) +
geom_point(shape = 16, size = 1,color= "#cde1b1", alpha = 0.5) + geom_smooth(method = "lm", se=F, colour = "#E86E23") +
facet_wrap(~Variable, ncol = 4, scales = "free") +
plotTheme()
cheapcodes_plot <- st_drop_geometry(listings_details.sf) %>%
dplyr::select(price2, cheapcodes) %>%
filter(price2 <= 1000000) %>%
gather(Variable, Value, -price2) %>%
ggplot(aes(Value, price2)) +
geom_point(shape = 16, size = 1,color= "#f6c192", alpha = 0.5) + geom_smooth(method = "lm", se=F, colour = "#E86E23") +
facet_wrap(~Variable, ncol = 3, scales = "free") +
plotTheme()
grid.arrange(Hist_build_plot, lagPrice_plot, cheapcodes_plot, ncol=3, top = "Price as a function of average distance to three nearest historical buildings, average price of three nearest rentals, and the existence of coded language affiliated with cheaper properties", bottom = "Figure 7")
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 51.9234565 | 37.7657583 | 1.3748819 | 0.1691877 |
| property_typeApartment | -30.1482621 | 34.4195089 | -0.8759062 | 0.3810944 |
| property_typeBed and breakfast | -28.8754265 | 34.5698494 | -0.8352778 | 0.4035741 |
| property_typeBoat | -19.3271374 | 34.5802629 | -0.5589066 | 0.5762334 |
| property_typeBoutique hotel | -12.1011289 | 36.8283910 | -0.3285815 | 0.7424764 |
| property_typeBungalow | -16.1572398 | 40.5445463 | -0.3985059 | 0.6902628 |
| property_typeCabin | -5.5548055 | 39.2372998 | -0.1415695 | 0.8874219 |
| property_typeCasa particular (Cuba) | -55.3502898 | 48.5888842 | -1.1391554 | 0.2546558 |
| property_typeChalet | -28.8919435 | 54.3463682 | -0.5316260 | 0.5949926 |
| property_typeCondominium | -22.1900339 | 34.6055817 | -0.6412270 | 0.5213846 |
| property_typeCottage | -21.3745878 | 39.8197358 | -0.5367838 | 0.5914246 |
| property_typeGuest suite | -25.9389283 | 34.7931058 | -0.7455192 | 0.4559691 |
| property_typeGuesthouse | 1.4852115 | 36.1777471 | 0.0410532 | 0.9672540 |
| property_typeHostel | -78.0002166 | 50.9585932 | -1.5306587 | 0.1258739 |
| property_typeHotel | 8.8917324 | 42.1051789 | 0.2111791 | 0.8327503 |
| property_typeHouse | -24.4458320 | 34.4783705 | -0.7090194 | 0.4783230 |
| property_typeHouseboat | 6.0039978 | 34.6755871 | 0.1731477 | 0.8625376 |
| property_typeLoft | 3.9072278 | 34.5702994 | 0.1130227 | 0.9100140 |
| property_typeOther | -31.4334659 | 35.8915960 | -0.8757890 | 0.3811582 |
| property_typeServiced apartment | 21.2355818 | 35.4832759 | 0.5984673 | 0.5495368 |
| property_typeTiny house | -32.0456913 | 45.6052324 | -0.7026758 | 0.4822683 |
| property_typeTownhouse | -10.6583706 | 34.5273543 | -0.3086935 | 0.7575588 |
| property_typeVilla | -4.2348765 | 36.5521097 | -0.1158586 | 0.9077661 |
| accommodates | 18.6472397 | 0.6950072 | 26.8302827 | 0.0000000 |
| bathrooms | 3.2206258 | 0.5549017 | 5.8039569 | 0.0000000 |
| bedrooms | 22.0929726 | 0.9417007 | 23.4607167 | 0.0000000 |
| neighbourhoodBanne Buiksloot | -12.5030324 | 11.4302664 | -1.0938531 | 0.2740362 |
| neighbourhoodBos en Lommer | -15.6580507 | 5.4818534 | -2.8563425 | 0.0042912 |
| neighbourhoodBuiksloterham | -30.5488617 | 10.0650285 | -3.0351491 | 0.0024081 |
| neighbourhoodBuikslotermeer | -21.8410583 | 10.2150054 | -2.1381348 | 0.0325212 |
| neighbourhoodBuitenveldert-Oost | -34.7145803 | 10.9203703 | -3.1788831 | 0.0014813 |
| neighbourhoodBuitenveldert-West | -30.6230575 | 8.7555958 | -3.4975413 | 0.0004709 |
| neighbourhoodDe Pijp | -5.3031527 | 4.5599270 | -1.1629907 | 0.2448510 |
| neighbourhoodDe Wallen | 22.2293171 | 5.2476865 | 4.2360223 | 0.0000229 |
| neighbourhoodFrederik Hendrikbuurt | -7.8029454 | 5.6104282 | -1.3907932 | 0.1643079 |
| neighbourhoodGrachtengordel | 1.7767408 | 4.1723203 | 0.4258400 | 0.6702303 |
| neighbourhoodHoofddorppleinbuurt | -23.0442186 | 5.9651410 | -3.8631473 | 0.0001124 |
| neighbourhoodIJplein en Vogelbuurt | -28.3913424 | 7.2077893 | -3.9389806 | 0.0000822 |
| neighbourhoodIndische Buurt | -16.4528140 | 5.0670328 | -3.2470313 | 0.0011686 |
| neighbourhoodJordaan | 0.3452296 | 4.6165440 | 0.0747810 | 0.9403899 |
| neighbourhoodKadoelen | -16.1939612 | 17.3256121 | -0.9346834 | 0.3499659 |
| neighbourhoodLandelijk Noord | -40.9608341 | 14.0701949 | -2.9111774 | 0.0036058 |
| neighbourhoodMuseumkwartier | 4.5846795 | 5.7742546 | 0.7939864 | 0.4272154 |
| neighbourhoodNieuwendam-Noord | -43.8522503 | 12.5471704 | -3.4949912 | 0.0004754 |
| neighbourhoodNieuwendammerdijk en Buiksloterdijk | -42.0188355 | 13.3816237 | -3.1400401 | 0.0016924 |
| neighbourhoodNieuwendammerham | -3.0199954 | 23.7109842 | -0.1273669 | 0.8986516 |
| neighbourhoodNieuwmarkt en Lastage | -0.8774195 | 5.4182284 | -0.1619385 | 0.8713564 |
| neighbourhoodOost | -3.5501865 | 6.7045959 | -0.5295154 | 0.5964555 |
| neighbourhoodOostelijke Eilanden en Kadijken | -20.2432887 | 5.9611171 | -3.3958884 | 0.0006858 |
| neighbourhoodOosterparkbuurt | -14.2628639 | 4.9976308 | -2.8539251 | 0.0043239 |
| neighbourhoodOostzanerwerf | -16.6012850 | 18.9222381 | -0.8773426 | 0.3803140 |
| neighbourhoodOsdorp | -38.3159116 | 11.3125357 | -3.3870312 | 0.0007083 |
| neighbourhoodOud-West | -13.3910917 | 4.3047785 | -3.1107504 | 0.0018695 |
| neighbourhoodOud-Zuid | -10.3931262 | 6.0285815 | -1.7239754 | 0.0847320 |
| neighbourhoodOvertoomse Veld | -20.7560277 | 7.6631425 | -2.7085530 | 0.0067651 |
| neighbourhoodRivierenbuurt | -5.5851324 | 5.7191437 | -0.9765679 | 0.3287982 |
| neighbourhoodSlotermeer-Noordoost | -37.5116651 | 10.3687880 | -3.6177483 | 0.0002981 |
| neighbourhoodSlotermeer-Zuidwest | -28.5622169 | 10.1934081 | -2.8020282 | 0.0050845 |
| neighbourhoodSlotervaart | -29.5165934 | 7.7865523 | -3.7907141 | 0.0001508 |
| neighbourhoodSpaarndammer en Zeeheldenbuurt | -28.5130577 | 6.0192055 | -4.7370135 | 0.0000022 |
| neighbourhoodStadionbuurt | -23.9118296 | 6.7280896 | -3.5540296 | 0.0003805 |
| neighbourhoodTuindorp Buiksloot | -37.4024176 | 11.3811219 | -3.2863559 | 0.0010172 |
| neighbourhoodTuindorp Nieuwendam | -53.9716424 | 12.3060500 | -4.3857812 | 0.0000116 |
| neighbourhoodTuindorp Oostzaan | -39.5627108 | 12.6761103 | -3.1210450 | 0.0018054 |
| neighbourhoodVolewijck | -36.2208164 | 8.5931260 | -4.2150920 | 0.0000251 |
| neighbourhoodWatergraafsmeer | -16.3057194 | 6.1917760 | -2.6334479 | 0.0084606 |
| neighbourhoodWeesperbuurt en Plantage | -8.2474445 | 5.7059065 | -1.4454223 | 0.1483593 |
| neighbourhoodWestelijke Eilanden | -9.8739618 | 6.1419949 | -1.6076148 | 0.1079397 |
| neighbourhoodZeeburg | -13.1612594 | 6.2880850 | -2.0930473 | 0.0363610 |
| room_typePrivate room | -34.4060600 | 1.5243603 | -22.5708192 | 0.0000000 |
| room_typeShared room | -52.0929819 | 8.6419605 | -6.0279125 | 0.0000000 |
| availability_365 | 0.1392824 | 0.0051852 | 26.8616629 | 0.0000000 |
| number_of_reviews | -0.1374698 | 0.0139222 | -9.8741435 | 0.0000000 |
| minimum_nights | -0.1203122 | 0.0373444 | -3.2216931 | 0.0012770 |
| markets_nn1 | -0.0023564 | 0.0021340 | -1.1042259 | 0.2695121 |
| metrostops_nn2 | 0.0018127 | 0.0042383 | 0.4276859 | 0.6688857 |
| students_nn2 | -0.0022773 | 0.0022395 | -1.0169033 | 0.3092151 |
| wallart_nn3 | -0.0090397 | 0.0022491 | -4.0191870 | 0.0000587 |
| playgrounds_nn2 | 0.0056290 | 0.0044678 | 1.2599002 | 0.2077241 |
| histbuild_nn3 | -0.0010773 | 0.0022068 | -0.4881697 | 0.6254365 |
| monuments_nn3 | -0.0079749 | 0.0029689 | -2.6861221 | 0.0072363 |
| restaurant_nn3 | -0.0034376 | 0.0057245 | -0.6005013 | 0.5481809 |
| university_nn1 | -0.0012755 | 0.0018495 | -0.6896397 | 0.4904310 |
| schools_nn2 | 0.0027790 | 0.0033050 | 0.8408243 | 0.4004592 |
| retail_nn3 | 0.0013180 | 0.0019848 | 0.6640297 | 0.5066811 |
| industrial_nn1 | 0.0031671 | 0.0035054 | 0.9034874 | 0.3662811 |
| luxuryyes | 32.7730587 | 2.3770997 | 13.7869935 | 0.0000000 |
| lagPrice | 0.0213229 | 0.0134636 | 1.5837371 | 0.1132737 |
| canalyes | 13.8005572 | 2.0078416 | 6.8733296 | 0.0000000 |
| expamenyes | 6.1278234 | 1.1678133 | 5.2472627 | 0.0000002 |
| expcodesyes | 0.0108277 | 1.2024254 | 0.0090048 | 0.9928154 |
| citycenterdescyes | -1.3403139 | 1.2679790 | -1.0570473 | 0.2905063 |
| cheapcodesyes | -1.8720373 | 1.0649445 | -1.7578731 | 0.0787886 |
| poolyes | 25.9880724 | 7.9468232 | 3.2702467 | 0.0010769 |
| sig_cell | -0.0140910 | 0.0024157 | -5.8331489 | 0.0000000 |
| beds | -0.0480557 | 0.6936357 | -0.0692809 | 0.9447669 |
| review_scores_accuracy | 1.0814071 | 1.0053952 | 1.0756040 | 0.2821210 |
| review_scores_cleanliness | 3.3462583 | 0.7592616 | 4.4072533 | 0.0000105 |
| review_scores_rating | 0.8113450 | 0.1323486 | 6.1303620 | 0.0000000 |
| review_scores_communication | -0.2744106 | 1.0412383 | -0.2635426 | 0.7921359 |
| review_scores_location | 2.0546188 | 0.8457447 | 2.4293604 | 0.0151366 |
| review_scores_value | -6.3871075 | 0.8498631 | -7.5154545 | 0.0000000 |
| reviews_per_month | -0.8801721 | 0.4928991 | -1.7857044 | 0.0741665 |
| expamencatyes | -2.5325977 | 7.0642992 | -0.3585066 | 0.7199691 |
| expsummaryyes | 2.1598107 | 1.2765303 | 1.6919384 | 0.0906776 |
| expdescripyes | -16.8050807 | 8.0255362 | -2.0939511 | 0.0362804 |
| expneighbdescyes | -3.9033671 | 2.0130022 | -1.9390774 | 0.0525098 |
| expneighwoclouyes | 8.1577218 | 3.8724376 | 2.1066116 | 0.0351671 |
| guests_included | 0.2434092 | 0.6580766 | 0.3698797 | 0.7114771 |
| cancellation_policymoderate | 0.4397150 | 1.3228215 | 0.3324069 | 0.7395865 |
| cancellation_policystrict_14_with_grace_period | 4.5737599 | 1.3349542 | 3.4261549 | 0.0006138 |
| cancellation_policysuper_strict_60 | 73.7729522 | 12.9140336 | 5.7126189 | 0.0000000 |
| size_of_groupSolo | 55.2314752 | 5.8855930 | 9.3841818 | 0.0000000 |
| expneighborsCheap Neighbors | 2.7660931 | 1.9681803 | 1.4054064 | 0.1599203 |
| expneighborsExpensive Neighbors | 4.4682453 | 3.0002041 | 1.4893138 | 0.1364249 |
| hotspotlevelNot Hot Spot | 3.6609692 | 3.4940747 | 1.0477650 | 0.2947631 |
| hotspotlevelSemi-Hot | -3.0407486 | 2.4246416 | -1.2541023 | 0.2098235 |
| R² | 0.5063 | | | |
| Adjusted R² | 0.5026 | | | |
We trained and tested our model on the selected variables to check both its accuracy and its generalizability. We split the rental listings into separate training and test sets, fit the model on the training set, and then used the test set to assess generalizability by predicting nightly rental prices for listings the model had not seen.
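A minimal sketch of this train/test workflow, assuming a simple random split: fit `lm()` on the training rows, predict the held-out rows, and summarize error as mean absolute error (MAE) and mean absolute percentage error (MAPE). The split proportion, seed, and helper name `holdout_errors()` are hypothetical and not taken from our analysis; the toy data is synthetic.

```r
# Hypothetical helper: random train/test split, fit lm() on the training rows,
# and return MAE and MAPE (as a fraction) on the held-out rows.
holdout_errors <- function(data, formula, response, p = 0.75, seed = 414) {
  set.seed(seed)  # reproducible split (note: resets the global RNG state)
  train_idx <- sample.int(nrow(data), size = floor(p * nrow(data)))
  train <- data[train_idx, , drop = FALSE]
  test  <- data[-train_idx, , drop = FALSE]
  fit   <- lm(formula, data = train)
  pred  <- predict(fit, newdata = test)
  obs   <- test[[response]]
  c(MAE  = mean(abs(obs - pred)),
    MAPE = mean(abs(obs - pred) / obs))  # assumes obs > 0 (zero prices were removed)
}

# Synthetic data with an exact linear relationship, so both errors are ~0
toy_df <- data.frame(x = 1:40)
toy_df$price <- 2 * toy_df$x + 1
holdout_errors(toy_df, price ~ x, "price")
```

One practical caveat: `predict()` stops with an error if a factor level, such as a rare property_type, appears only in the test set, so rare levels may need to be grouped or the split stratified.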
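The model summary reports, for each variable, a coefficient estimate with its standard error, t value, and p-value, along with overall fit statistics (R² and adjusted R²) that describe how much of the variation in price the model explains. A variable’s p-value is the probability of observing a t value at least as extreme as the one estimated if that variable truly had no relationship with price, holding the other variables constant; small p-values (for example, below 0.05) indicate that the variable is a statistically significant predictor of price.

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|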
| (Intercept) | 64.6960647 | 38.3116586 | 1.6886783 | 0.0913096 |
| property_typeApartment | -31.4201420 | 33.7569507 | -0.9307755 | 0.3519903 |
| property_typeBed and breakfast | -27.9788486 | 33.9601906 | -0.8238720 | 0.4100303 |
| property_typeBoat | -13.8143051 | 33.9872542 | -0.4064555 | 0.6844159 |
| property_typeBoutique hotel | -13.4682739 | 36.3689677 | -0.3703232 | 0.7111488 |
| property_typeBungalow | -26.9728296 | 43.0706535 | -0.6262461 | 0.5311666 |
| property_typeCabin | -6.3321886 | 40.4134428 | -0.1566852 | 0.8754958 |
| property_typeCasa particular (Cuba) | -94.7458609 | 53.1712187 | -1.7819012 | 0.0747931 |
| property_typeChalet | -2.7967574 | 67.5484015 | -0.0414038 | 0.9669748 |
| property_typeCondominium | -26.4536970 | 34.0213846 | -0.7775609 | 0.4368448 |
| property_typeCottage | -22.9958648 | 39.0854019 | -0.5883492 | 0.5563101 |
| property_typeGuest suite | -30.5828082 | 34.2931625 | -0.8918048 | 0.3725172 |
| property_typeGuesthouse | -8.6336686 | 36.0440428 | -0.2395311 | 0.8106983 |
| property_typeHostel | -114.5381795 | 60.6833263 | -1.8874737 | 0.0591231 |
| property_typeHotel | -24.6035491 | 47.6100260 | -0.5167724 | 0.6053255 |
| property_typeHouse | -25.1949104 | 33.8393420 | -0.7445449 | 0.4565629 |
| property_typeHouseboat | 1.6712511 | 34.0935821 | 0.0490195 | 0.9609046 |
| property_typeLoft | 4.1207538 | 33.9837463 | 0.1212566 | 0.9034900 |
| property_typeOther | -32.5173016 | 36.0702847 | -0.9014983 | 0.3673433 |
| property_typeServiced apartment | 37.3088152 | 35.4784286 | 1.0515915 | 0.2930102 |
| property_typeTiny house | -42.8674221 | 47.7616414 | -0.8975282 | 0.3694569 |
| property_typeTownhouse | -14.2954186 | 33.9014517 | -0.4216757 | 0.6732700 |
| property_typeVilla | -10.5454421 | 36.6773486 | -0.2875192 | 0.7737202 |
| accommodates | 17.5730971 | 0.8246172 | 21.3106128 | 0.0000000 |
| bathrooms | 2.2138821 | 0.5536150 | 3.9989564 | 0.0000640 |
| bedrooms | 20.5644041 | 1.1019322 | 18.6621325 | 0.0000000 |
| neighbourhoodBanne Buiksloot | -20.5748334 | 13.2997946 | -1.5470038 | 0.1218912 |
| neighbourhoodBos en Lommer | -18.8590901 | 6.4776727 | -2.9113991 | 0.0036054 |
| neighbourhoodBuiksloterham | -34.7481228 | 12.1581348 | -2.8580143 | 0.0042711 |
| neighbourhoodBuikslotermeer | -22.7318428 | 12.0442862 | -1.8873549 | 0.0591391 |
| neighbourhoodBuitenveldert-Oost | -37.3753546 | 11.9136498 | -3.1371876 | 0.0017103 |
| neighbourhoodBuitenveldert-West | -37.8207036 | 10.2647080 | -3.6845377 | 0.0002302 |
| neighbourhoodDe Pijp | -8.0031599 | 5.3761751 | -1.4886345 | 0.1366125 |
| neighbourhoodDe Wallen | 23.2543062 | 6.1897867 | 3.7568833 | 0.0001729 |
| neighbourhoodFrederik Hendrikbuurt | -10.4328591 | 6.6423240 | -1.5706640 | 0.1162896 |
| neighbourhoodGrachtengordel | 0.2981652 | 4.9282740 | 0.0605009 | 0.9517578 |
| neighbourhoodHoofddorppleinbuurt | -27.0317242 | 7.0761446 | -3.8201204 | 0.0001341 |
| neighbourhoodIJplein en Vogelbuurt | -33.3950238 | 8.4611773 | -3.9468531 | 0.0000797 |
| neighbourhoodIndische Buurt | -16.4615391 | 6.0078864 | -2.7399884 | 0.0061541 |
| neighbourhoodJordaan | -3.4215418 | 5.4632509 | -0.6262831 | 0.5311423 |
| neighbourhoodKadoelen | -36.3846610 | 20.5315682 | -1.7721326 | 0.0764004 |
| neighbourhoodLandelijk Noord | -43.4223657 | 16.0539720 | -2.7047740 | 0.0068457 |
| neighbourhoodMuseumkwartier | -5.7958781 | 6.8075719 | -0.8513870 | 0.3945731 |
| neighbourhoodNieuwendam-Noord | -50.1373883 | 15.0498510 | -3.3314209 | 0.0008669 |
| neighbourhoodNieuwendammerdijk en Buiksloterdijk | -47.9557044 | 15.7394211 | -3.0468531 | 0.0023180 |
| neighbourhoodNieuwendammerham | 3.6088717 | 29.7954380 | 0.1211216 | 0.9035969 |
| neighbourhoodNieuwmarkt en Lastage | -5.8685231 | 6.4487998 | -0.9100179 | 0.3628331 |
| neighbourhoodOost | -5.2940857 | 7.8822651 | -0.6716452 | 0.5018238 |
| neighbourhoodOostelijke Eilanden en Kadijken | -21.2208410 | 7.0690946 | -3.0019178 | 0.0026889 |
| neighbourhoodOosterparkbuurt | -16.0510968 | 5.9009365 | -2.7200931 | 0.0065367 |
| neighbourhoodOostzanerwerf | -29.1041629 | 21.7257004 | -1.3396191 | 0.1803970 |
| neighbourhoodOsdorp | -42.7509903 | 13.3295890 | -3.2072249 | 0.0013441 |
| neighbourhoodOud-West | -13.9312778 | 5.0995897 | -2.7318429 | 0.0063082 |
| neighbourhoodOud-Zuid | -13.3792564 | 7.0195993 | -1.9059858 | 0.0566782 |
| neighbourhoodOvertoomse Veld | -19.9844074 | 8.9498004 | -2.2329445 | 0.0255728 |
| neighbourhoodRivierenbuurt | -8.8682756 | 6.7506646 | -1.3136893 | 0.1889783 |
| neighbourhoodSlotermeer-Noordoost | -36.7652321 | 12.0586338 | -3.0488721 | 0.0023025 |
| neighbourhoodSlotermeer-Zuidwest | -40.3413443 | 12.3385486 | -3.2695372 | 0.0010806 |
| neighbourhoodSlotervaart | -28.0883357 | 9.2740588 | -3.0286993 | 0.0024618 |
| neighbourhoodSpaarndammer en Zeeheldenbuurt | -31.5662524 | 7.0780863 | -4.4597157 | 0.0000083 |
| neighbourhoodStadionbuurt | -23.1839545 | 7.8334137 | -2.9596234 | 0.0030868 |
| neighbourhoodTuindorp Buiksloot | -45.5347110 | 13.2057569 | -3.4480955 | 0.0005667 |
| neighbourhoodTuindorp Nieuwendam | -49.1521909 | 14.8833481 | -3.3024955 | 0.0009614 |
| neighbourhoodTuindorp Oostzaan | -54.1052297 | 14.8948352 | -3.6324826 | 0.0002820 |
| neighbourhoodVolewijck | -34.2436055 | 10.2162711 | -3.3518693 | 0.0008054 |
| neighbourhoodWatergraafsmeer | -19.4997139 | 7.2088683 | -2.7049619 | 0.0068418 |
| neighbourhoodWeesperbuurt en Plantage | -8.4493723 | 6.7581441 | -1.2502504 | 0.2112348 |
| neighbourhoodWestelijke Eilanden | -16.1831981 | 7.3061649 | -2.2150059 | 0.0267802 |
| neighbourhoodZeeburg | -14.3198893 | 7.3431966 | -1.9500893 | 0.0511910 |
| room_typePrivate room | -33.2564064 | 1.7811563 | -18.6712453 | 0.0000000 |
| room_typeShared room | -57.0871743 | 9.9161076 | -5.7570144 | 0.0000000 |
| availability_365 | 0.1382717 | 0.0060919 | 22.6977063 | 0.0000000 |
| number_of_reviews | -0.1280636 | 0.0164372 | -7.7910949 | 0.0000000 |
| minimum_nights | -0.1568261 | 0.0495749 | -3.1634158 | 0.0015636 |
| markets_nn1 | -0.0011827 | 0.0025011 | -0.4728608 | 0.6363219 |
| metrostops_nn2 | 0.0044905 | 0.0049318 | 0.9105080 | 0.3625747 |
| students_nn2 | -0.0017175 | 0.0026028 | -0.6598707 | 0.5093507 |
| wallart_nn3 | -0.0089450 | 0.0026235 | -3.4095056 | 0.0006532 |
| playgrounds_nn2 | 0.0108027 | 0.0052286 | 2.0660540 | 0.0388469 |
| histbuild_nn3 | -0.0012133 | 0.0025760 | -0.4709884 | 0.6376584 |
| monuments_nn3 | -0.0084732 | 0.0034659 | -2.4447569 | 0.0145106 |
| restaurant_nn3 | -0.0045769 | 0.0066834 | -0.6848188 | 0.4934728 |
| university_nn1 | -0.0006791 | 0.0021630 | -0.3139621 | 0.7535558 |
| schools_nn2 | -0.0031132 | 0.0038449 | -0.8097174 | 0.4181202 |
| retail_nn3 | 0.0029055 | 0.0023326 | 1.2456419 | 0.2129226 |
| industrial_nn1 | 0.0024174 | 0.0040976 | 0.5899392 | 0.5552436 |
| luxuryyes | 28.6881468 | 2.7570447 | 10.4053977 | 0.0000000 |
| lagPrice | 0.0265946 | 0.0158534 | 1.6775270 | 0.0934680 |
| canalyes | 11.3983534 | 2.3607928 | 4.8281888 | 0.0000014 |
| expamenyes | 5.3517563 | 1.3694240 | 3.9080345 | 0.0000936 |
| expcodesyes | -0.4872572 | 1.4109252 | -0.3453459 | 0.7298410 |
| citycenterdescyes | -1.7537299 | 1.4891194 | -1.1776959 | 0.2389435 |
| cheapcodesyes | -2.6772635 | 1.2502492 | -2.1413840 | 0.0322651 |
| poolyes | 21.7950496 | 9.6114645 | 2.2676096 | 0.0233724 |
| sig_cell | -0.0147106 | 0.0028141 | -5.2273638 | 0.0000002 |
| beds | 2.9734691 | 0.8456868 | 3.5160405 | 0.0004398 |
| review_scores_accuracy | 1.2845762 | 1.1772055 | 1.0912081 | 0.2752053 |
| review_scores_cleanliness | 2.8346929 | 0.8791821 | 3.2242387 | 0.0012668 |
| review_scores_rating | 0.8395258 | 0.1532516 | 5.4780870 | 0.0000000 |
| review_scores_communication | -0.3453102 | 1.2328056 | -0.2801011 | 0.7794052 |
| review_scores_location | 1.8173183 | 0.9795064 | 1.8553409 | 0.0635744 |
| review_scores_value | -5.8769957 | 0.9932434 | -5.9169740 | 0.0000000 |
| reviews_per_month | -1.3353345 | 0.5753636 | -2.3208533 | 0.0203130 |
| expamencatyes | -9.4729312 | 8.2304316 | -1.1509641 | 0.2497722 |
| expsummaryyes | 2.5047093 | 1.4977348 | 1.6723317 | 0.0944875 |
| expdescripyes | -21.1388224 | 9.3696186 | -2.2561028 | 0.0240839 |
| expneighbdescyes | -2.5734512 | 2.3710858 | -1.0853471 | 0.2777919 |
| expneighwoclouyes | 7.9299603 | 4.5496874 | 1.7429682 | 0.0813673 |
| guests_included | -0.3049352 | 0.7859769 | -0.3879696 | 0.6980461 |
| cancellation_policymoderate | -0.2415770 | 1.5539666 | -0.1554583 | 0.8764629 |
| cancellation_policystrict_14_with_grace_period | 4.9086469 | 1.5660670 | 3.1343786 | 0.0017267 |
| cancellation_policysuper_strict_60 | 59.4161295 | 14.8311879 | 4.0061612 | 0.0000621 |
| size_of_groupSolo | 56.6031507 | 7.0124621 | 8.0717942 | 0.0000000 |
| expneighborsCheap Neighbors | 3.8931421 | 2.2988212 | 1.6935385 | 0.0903815 |
| expneighborsExpensive Neighbors | 6.5321470 | 3.5321645 | 1.8493326 | 0.0644368 |
| hotspotlevelNot Hot Spot | 3.2473808 | 4.0911219 | 0.7937629 | 0.4273507 |
| hotspotlevelSemi-Hot | -3.7868239 | 2.8549585 | -1.3264024 | 0.1847341 |
| R2 | 0.516 | | | |
| Adjusted R2 | 0.5109 | | | |
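The next sketch makes the coefficient table easier to read. It recomputes the `t value` and `Pr(>|t|)` columns for one row (`bathrooms`) from that row's estimate and standard error. The residual degrees of freedom are not shown in the table, so the sketch assumes a normal approximation to the t distribution. That approximation is reasonable when a model is fit on thousands of listings.

```r
# Sketch: how the "t value" and "Pr(>|t|)" columns follow from the
# estimate and its standard error (bathrooms row of the table above).
estimate  <- 2.2138821
std_error <- 0.5536150

# t statistic: how many standard errors the estimate lies from zero
t_value <- estimate / std_error

# Two-sided p-value under a normal approximation (assumption: the residual
# degrees of freedom are large enough that the t distribution is near-normal)
p_value <- 2 * pnorm(-abs(t_value))

round(t_value, 4)
signif(p_value, 2)
```

With these inputs, `t_value` matches the table's 3.99896. `signif(p_value, 2)` gives 6.4e-05, which agrees with the reported 0.0000640 to two significant figures.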
This section evaluates both model accuracy and generalizability. Predictions on the test set show how accurately the model predicts rental price for listings it was not trained on. Our k-fold cross-validation then tests whether that accuracy holds across many different holdout sets. Understanding both is critical to ensuring that Checkbnb users receive an accurate predicted rental price regardless of their home's specifications or location.
| Regression | MAE ($) | MAPE (proportion) |
|---|---|---|
| Baseline Regression | 37.94078 | 0.2566976 |
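Both metrics in the table can be written in a couple of lines of R. This sketch uses made-up values. It expresses MAPE as a proportion, matching the table, where 0.2567 is roughly 25.7%. The neighborhood error map later in this section instead multiplies by 100 to report percentages.

```r
# Sketch of the two accuracy metrics reported in the table above.
# MAPE is returned as a proportion (0.25 = 25%).
mae  <- function(actual, pred) mean(abs(pred - actual))
mape <- function(actual, pred) mean(abs(pred - actual) / actual)

# Made-up illustrative values
actual <- c(100, 200, 150)
pred   <- c(90, 230, 150)

mae(actual, pred)   # (10 + 30 + 0) / 3 = 13.33
mape(actual, pred)  # (0.10 + 0.15 + 0) / 3 = 0.0833
```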
The plots below show how well our model predicts rental prices in both the training and test sets. The orange line marks a perfect fit, where predicted rental prices equal actual values. As the bottom-right scatterplot shows, predictions in the training data track actual values closely. In the test data, the model slightly underpredicts higher-priced rentals.
# Requires the tidyverse (dplyr, tidyr, ggplot2); `preds` holds actual and predicted prices
library(tidyverse)
preds %>%
  mutate(price_Decile = ntile(actual, 10)) %>%
  group_by(price_Decile) %>%
  summarize(meanObserved = mean(actual, na.rm = TRUE),
            meanPrediction = mean(pred, na.rm = TRUE)) %>%
  gather(Variable, Value, -price_Decile) %>%
  ggplot(aes(price_Decile, Value, shape = Variable)) +
  geom_point(size = 2) +
  # Connect the observed and predicted means within each decile
  geom_path(aes(group = price_Decile), colour = "black") +
  scale_shape_manual(values = c(2, 17)) +
  labs(title = "Predicted and observed Price by observed Price Decile")
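The decile plot is a visual check for systematic bias. The same check can be done numerically by averaging the signed error (prediction minus actual) within each decile of actual price. In this hypothetical sketch the data are simulated, and predictions are deliberately shrunk toward the mean. That shrinkage makes the underprediction of high-priced rentals described above appear in the top deciles. The base-R decile assignment approximates what `dplyr::ntile()` does.

```r
# Sketch: mean signed error by decile of actual price.
# Simulated (hypothetical) data: predictions are shrunk toward the mean,
# so expensive listings are underpredicted and cheap ones overpredicted.
set.seed(1)
n <- 2000
actual <- rlnorm(n, meanlog = log(130), sdlog = 0.5)
pred   <- mean(actual) + 0.7 * (actual - mean(actual)) + rnorm(n, sd = 10)

# Base-R approximation of dplyr::ntile(actual, 10)
decile <- ceiling(10 * rank(actual, ties.method = "first") / length(actual))

# Negative values mean the model underpredicts in that decile
mean_error_by_decile <- tapply(pred - actual, decile, mean)
round(mean_error_by_decile, 1)
```

A signed error, rather than an absolute one, is what reveals the direction of the bias in each price band.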
We then tested the generalizability of our model using k-fold cross-validation with k = 100. The data are split into 100 groups (folds). One fold serves as the test set while the remaining 99 serve as the training set. This process is repeated so that each fold is held out once, producing 100 error “scores” that describe how well the model predicts on new data.
Our average MAE across folds is about $38, similar to the MAE from our initial test. Its standard deviation of $3.91 indicates little variation across the 100 folds. The histogram in Figure 9 below confirms this, showing a relatively narrow distribution of errors. This suggests that our model generalizes reasonably well to new rental price data.
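The caret call in the code that follows, `trainControl(method = "cv", number = 100)`, automates this fold logic. The logic itself can be sketched in base R. Here the data and the two-variable model are hypothetical stand-ins for the listings and the full specification. Each row is assigned to exactly one of 100 folds, and every fold is held out once. The sketch keeps one MAE per fold and summarizes the folds by their mean and standard deviation.

```r
# Sketch of 100-fold cross-validation in base R.
# Assumptions (hypothetical): simulated data and a two-predictor model.
set.seed(717)
n <- 3000
sim <- data.frame(accommodates = sample(1:6, n, replace = TRUE),
                  bedrooms     = sample(0:3, n, replace = TRUE))
sim$price <- 40 + 18 * sim$accommodates + 20 * sim$bedrooms + rnorm(n, sd = 30)

k <- 100
# Shuffle rows into k folds of equal size (30 rows each here)
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_mae <- numeric(k)
for (i in seq_len(k)) {
  train_rows <- fold_id != i                         # 99 folds train the model
  fit  <- lm(price ~ accommodates + bedrooms, data = sim[train_rows, ])
  pred <- predict(fit, newdata = sim[!train_rows, ]) # 1 fold is held out
  fold_mae[i] <- mean(abs(pred - sim$price[!train_rows]))
}

# One MAE "score" per fold, summarized by its mean and spread
c(mean_MAE = mean(fold_mae), sd_MAE = sd(fold_mae))
```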
# Requires caret and sf; `rents` is the sf object of Amsterdam listings built earlier
library(caret)
library(sf)
# 100-fold cross-validation, keeping each held-out prediction for later mapping
fitControl <- trainControl(method = "cv",
                           number = 100,
                           savePredictions = TRUE)
set.seed(717)
reg1.cv <- train(price2 ~ .,
        data = st_drop_geometry(rents) %>%
          dplyr::select(price2, property_type, accommodates, bathrooms, bedrooms,
                        neighbourhood, room_type, availability_365, number_of_reviews,
                        minimum_nights, Entire_Home, markets_nn1, metrostops_nn2, students_nn2,
                        wallart_nn3, playgrounds_nn2, histbuild_nn3, monuments_nn3, restaurant_nn3,
                        university_nn1, schools_nn2, retail_nn3, industrial_nn1, luxury, lagPrice,
                        canal, expamen, expcodes, citycenterdesc, cheapcodes),
        method = "lm",
        trControl = fitControl,
        na.action = na.pass)
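# Mean and standard deviation of MAE across folds, then a histogram of fold MAEs
reg1.cv.resample <- reg1.cv$resample
mean(reg1.cv.resample$MAE)
sd(reg1.cv.resample$MAE)
# plotTheme() is the report's own ggplot theme helper, defined earlier
ggplot(reg1.cv.resample, aes(x = MAE)) +
  geom_histogram(color = "#ff5a5f", fill = "#f6c192", alpha = .7, bins = 50) +
  labs(title = "Histogram of Mean Absolute Error Across 100 Folds",
       subtitle = "Figure 9") +
  plotTheme()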
# Attach each listing's out-of-fold prediction and compute its error
cv_preds <- reg1.cv$pred
map_preds <- rents %>%
  rowid_to_column(var = "rowIndex") %>%
  left_join(cv_preds, by = "rowIndex") %>%
  mutate(rentPrice.AbsError = abs(pred - obs),
         PercentError = (rentPrice.AbsError / price2) * 100)
# Spatially join listings to neighborhoods and average errors per neighborhood
map_preds_new2 <- st_join(neighborhoods_new.sf, map_preds, left = TRUE)
map_preds_sum <- map_preds_new2 %>%
  group_by(neighbourhood.x) %>%
  summarise(meanMAE = mean(rentPrice.AbsError),
            meanMAPE = mean(PercentError))
# Map mean percentage error by neighborhood; paletteblues and mapTheme() are defined earlier
map_preds_sum %>%
  st_sf() %>%
  ggplot() +
  geom_sf(data = map_preds_sum, fill = "grey40") +
  geom_sf(aes(fill = meanMAPE)) +
  scale_fill_gradient(low = paletteblues[1], high = paletteblues[5],
                      name = "meanMAPE") +
  labs(title = "Mean MAPE by neighborhood",
       subtitle = "Figure 11") +
  mapTheme()
To optimize the experience of Checkbnb users, we built a model that is reasonably accurate, generalizes well, and keeps price error low. Our goal at Checkbnb is for each user to get an accurate predicted price for their home so that they can list and rent their unit appropriately and in a timely fashion. We engineered a diverse set of variables that capture neighborhood-scale spatial processes as well as overall variation in price. Amsterdam's regulatory landscape is stringent and complicated. Within it, this model helps users navigate an Airbnb rental market that would otherwise be difficult to access. Airbnb wants to make becoming a host as simple and streamlined as possible, and we believe our model achieves this goal.
To improve the analysis, more up-to-date Airbnb data is required, since we do not know how the City's regulatory changes have affected the rental market. For example, the city banned Airbnbs in three city center neighborhoods. This may have constrained supply and pushed up rental prices elsewhere. There is also an opportunity to incorporate neighborhood demographics, socioeconomic indicators, and other census variables that we were unable to acquire for this study. Access to these data would help the model generalize across different neighborhood contexts and allow us to pinpoint any spatial biases.